Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization
Modern Deep Learning (DL) architectures based on transformers (e.g., BERT,
RoBERTa) have improved performance across a number of natural
language tasks. While such DL models show tremendous potential for
software engineering applications, they are often hampered by insufficient
training data. Particularly constrained are applications that require
project-specific data, such as bug localization, which aims to recommend the
code that must change to fix a newly submitted bug report. Deep learning models for bug
localization require a substantial training set of fixed bug reports, which are
available only in limited quantities, even in popular and actively developed software projects.
In this paper, we examine the effect of using synthetic training data on
transformer-based DL models that perform a more complex variant of bug
localization, which has the goal of retrieving bug-inducing changesets for each
bug report. To generate high-quality synthetic data, we propose novel data
augmentation operators that act on different constituent components of bug
reports. We also describe a data balancing strategy that aims to create a
corpus of augmented bug reports that better reflects the entire source code
base, because the existing bug reports used as training data usually reference
only a small part of the code base.
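The abstract does not say how the balancing strategy decides which bug reports to augment. The following is a minimal sketch of one way such a plan could work, assuming (hypothetically) that each bug report is mapped to the source files it references and that the goal is to bring every file up to a target number of referencing reports. The function name `plan_augmentation` and the `target_per_file` parameter are illustrative only, not the paper's method, and the augmentation operators themselves are not modeled.

```python
from collections import Counter
from typing import Dict, List


def plan_augmentation(
    reports: Dict[str, List[str]],
    all_files: List[str],
    target_per_file: int,
) -> Dict[str, int]:
    """Hypothetical balancing plan: number of augmented copies per bug report id.

    Each file's deficit is computed independently; this greedy sketch ignores
    that an augmented copy of a multi-file report also covers its other files.
    """
    # How many distinct bug reports currently reference each file.
    coverage = Counter(f for files in reports.values() for f in set(files))
    plan: Counter = Counter()
    for f in all_files:
        deficit = target_per_file - coverage[f]
        if deficit <= 0:
            continue  # file is already referenced often enough
        # Seed reports that mention this file; sorted for deterministic output.
        candidates = sorted(rid for rid, files in reports.items() if f in files)
        if not candidates:
            continue  # no seed report references this file, so nothing to augment
        # Spread the deficit round-robin over the candidate seed reports.
        for i in range(deficit):
            plan[candidates[i % len(candidates)]] += 1
    return dict(plan)


if __name__ == "__main__":
    seeds = {"B1": ["a.py", "b.py"], "B2": ["a.py"], "B3": ["c.py"]}
    print(plan_augmentation(seeds, ["a.py", "b.py", "c.py", "d.py"], 2))
```

In this toy input, `a.py` is already referenced twice, `b.py` and `c.py` each need one more report, and `d.py` cannot be covered because no seed report mentions it. That last case illustrates why augmentation alone cannot reach code that no bug report touches.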
Mining Sequences of Developer Interactions in Visual Studio for Usage Smells
In this paper, we present a semi-automatic approach for mining a large-scale dataset of IDE interactions to extract usage smells, i.e., inefficient IDE usage patterns exhibited by developers in the field. The approach first mines frequent IDE usage patterns, filtered by a set of thresholds and manually by the authors, which are subsequently supported (or disputed) by a developer survey in order to form usage smells. In contrast with conventional mining of IDE usage data, our approach identifies time-ordered sequences of developer actions that many developers in the field exhibit. This pattern mining workflow is resilient to the ample noise present in IDE datasets, which typically mix developer actions with other events. We identify usage patterns and smells that contribute to understanding the usability of Visual Studio for debugging, code search, and active file navigation and, more broadly, to understanding developer behavior during these software development activities. Among our findings is that developers are reluctant to use conditional breakpoints when debugging, both because of perceived IDE performance problems and because of the lack of error checking in specifying the conditional
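The core idea of keeping only time-ordered action sequences that many developers share can be sketched as follows. This is a simplified illustration, not the paper's miner: it counts only contiguous windows (n-grams) of a fixed length and allows no gaps, and it handles noise by dropping events listed in a hypothetical `noise` set before windowing. The action names and the `min_developers` threshold are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def frequent_action_sequences(
    sessions: Dict[str, List[str]],
    length: int,
    min_developers: int,
    noise: Set[str],
) -> Dict[Tuple[str, ...], int]:
    """Return contiguous action sequences of `length` shared by >= min_developers.

    Support is the number of distinct developers who exhibit a sequence,
    not the raw number of occurrences.
    """
    support: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for dev, events in sessions.items():
        # Drop non-action events (hypothetical noise filter) but keep time order.
        actions = [e for e in events if e not in noise]
        for i in range(len(actions) - length + 1):
            support[tuple(actions[i : i + length])].add(dev)
    # Apply the developer-count threshold.
    return {seq: len(devs) for seq, devs in support.items() if len(devs) >= min_developers}


if __name__ == "__main__":
    logs = {
        "dev1": ["SetBreakpoint", "DocumentSaved", "StartDebugging", "StepOver"],
        "dev2": ["SetBreakpoint", "StartDebugging", "StepOver"],
        "dev3": ["Build", "StepOver"],
    }
    print(frequent_action_sequences(logs, 2, 2, {"DocumentSaved"}))
```

Counting distinct developers rather than occurrences prevents a single heavy user from making a personal habit look like a population-wide pattern. The filtered output would then go on to manual review and survey validation, which the sketch does not model.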
How the Sando Search Tool Recommends Queries
Developers spend a significant amount of time searching their local codebase.
To help them search efficiently, researchers have proposed novel tools that
apply state-of-the-art information retrieval algorithms to retrieve relevant
code snippets from the local codebase. However, these tools still rely on the
developer to craft an effective query, which requires the developer to be
familiar with the terms contained in the relevant code snippets. Our empirical
data from a state-of-the-art local code search tool, called Sando, suggests
that developers are sometimes unacquainted with their local codebase. In order
to bridge the gap between developers and their ever-increasing local codebase,
in this paper we demonstrate the recommendation techniques integrated in Sando